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ABSTRACT 

Efficient predictive models and data analysis techniques for the analysis of photometric and 
spectroscopic observations of galaxies are not only desirable, but required, in view of the over- 
whelming quantities of data becoming available. We present the results of a novel application 
of Bayesian latent variable modelling techniques, where we have formulated a data-driven 
algorithm that allows one to explore the stellar populations of a large sample of galaxies from 
their spectra, without the application of detailed physical models. Our only assumption is that 
the galaxy spectrum can be expressed as a linear superposition of a small number of indepen- 
dent factors, each a spectrum of a stellar sub-population that cannot be individually observed. 
A probabilistic latent variable architecture that explicitly encodes this assumption is then for- 
mulated, and a rigorous Bayesian methodology is employed for solving the inverse modelling 
problem from the available data. A powerful aspect of this method is that it formulates a den- 
sity model of the spectra, based on which we can handle observational errors. Further, we 
can recover missing data both from the original set of spectra which might have incomplete 
spectral coverage of each galaxy, or from previously unseen spectra of the same kind. 

We apply this method to a sample of 21 ultraviolet-to-optical spectra of well-studied 
early-type galaxies, for which we also derive detailed physical models of star formation his- 
tory (i.e. age, metallicity and relative mass fraction of the component stellar populations). We 
also apply it to synthetic spectra made up of two stellar populations, spanning a large range 
of parameters. We apply four different data models, starting from a formulation of principal 
components analysis (PCA), which has been widely used. We explore alternative factor mod- 
els, relaxing the physically unrealistic assumption of Gaussian factors, as well as constraining 
the possibility of negative flux values that are allowed in PCA, and show that other models 
perform equally well or better, while yielding more physically acceptable results. In partic- 
ular, the more physically motivated assumptions of our rectified factor analysis enable it to 
perform better than PCA, and to recover physically meaningful results. 

We find that our data-driven Bayesian modelling allows us to identify those early-type 
galaxies that contain a significant stellar population that is ;£ 1 Gyr old. This experiment also 
concludes that our sample of early-type spectra showed no evidence of more than two ma- 
jor stellar populations differing significantly in age and metallicity. This method will help us 
to search for such young populations in a large ensemble of spectra of early-type galaxies, 
without fitting detailed models, and thereby to study the underlying physical processes gov- 
erning the formation and evolution of early-type galaxies, particularly those leading to the 
suppression of star formation in dense environments. In particular, this method would be a 
very useful tool for automatically discovering various interesting sub-classes of galaxies, for 
example post-starburst, or Eh-A galaxies. 

Key words: galaxies: elliptical and lenticular, cD - galaxies: stellar content - methods: data 
analysis 



1 INTRODUCTION 

The determination of the star formation history of early-type galax- 
ies has important implications for the still-controversial issue of 
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their formation and evolution. Are the bulk of their stars assem- 
bled at high redshift, followed by predominantly passive evolu- 
tion, or are they formed from low-redshift disk-disk merging? 
Or, as it seems likely from the variety of detailed elliptical mor- 
phology which is being revealed via projects such as those in- 
volving integral field spectrographs such as SAURON, does the 
class of ellipticals contain sub-classes, and are these dependent 
on the me thod of their formation, or on their local environment 
iBacon et al. 2001 ; Emsellem et al. 2004)? We would like to know 
how these sub-classes depend on local environment. In particular, 
we wish to know what mechanisms are responsible for the now 
well-known morphology-density relationship {e.g. Dressier 1980), 
which shows that the proportion of early-type galaxies increases as 
the proportion of late-type galaxies decreases in regions with higher 
gala xy density. Furth ermore, the recent interest in E+A galaxies 
(e.g. lTran et fl/.l2003h . in which the signature of a recent (S 1 Gyr) 
starburst is seen, but no significant ongoing star formation, suggests 
that the relation between the evolution of these galaxies and their 
constituent stellar populations is not well-understood. 

However, until recently, only small ensembles of galaxy spec- 
tra, spanning a large useful range, were available and the analysis of 
a large statistical sample of stellar populations of galaxies was not 
possible. With the recently-completed 2dF Galaxy Redshift Sur- 
vey (2dFGRS) and the ongoing Sloan Digital Sky Survey (SDSS) 
or 6dF Galaxy Survey (6dFGS), a few million galaxy spectra will 
soon be available. The detailed physical modelling of stellar popu- 
lations is numerically expensive, so the timely extraction of useful 
knowledge, such as the characteristics of the star formation history 
of galaxies, from these data will largely depend on developing ap- 
propriate data analysis tools that are able to complement more spe- 
cialised astrophysical studies, in particular by automating parts of 
the analyses. Some work has already been carried out using an ap- 
proach which incorporates assumptions fro™ phy sical models to a 
greater or lesser extent {e.g. Heavens et al. 2000;'Kauffmann et al\ 
[2003; Fernandes a/. 2005; Ocvirkera/. 2005a b; Solorioefa/. 

With the increase of data availability, significant research has 
recently been focusing on the use of data-driven methods of as- 
trophysical data analysis. Some of these attempt to analyse galaxy 
properties from spectra, independently from any physical model. 
The usefulness of the application of Principal Component Anal- 
ysis (PCA), a classical multivariate statistical analysis technique, 
has been exploited and demonstrated in various scenarios (see, e.g. 
Connollv ef aZ." 1995';'Folkes. Lahav & Maddox"l996; Ronen et al. 
1999; Madgwick et al. 2003a b). Indeed, PCA is an efficient tool 
for data compression, and as such, suitable for providing a com- 
pressed representation for subsequent physical analysis. In a dif- 
ferent line of research, the auto mated partitioning of spectra based 
on their morphological shapes ( Slonim. S omerville & Lahavl200ll) 
has also been explored with reasonable success. 

Here, we explore the application of a similar technique to a 
different problem, namely that of the data-driven identification of 
the spectra of distinct stellar sub-populations in the spectrum of a 
galaxy. Our only assumption is that the galaxy spectrum can be 
expressed as a linear superposition of a small number of indepen- 
dent factors, each a spectrum of a stellar sub-population that cannot 
be individually observed. We would like to implement an optimal 
feature transformation that would reveal interpretable and mean- 
ingful structural properties (factors) from a large sample of galaxy 
spectra. This would allow fast, automated extraction of some key 
physical properties from potentially very large data sets of the kind 
the Virtual Observatory would offer. 



Preliminary results based on a simple non-orthogonal projec- 
tion method have indicated that an identification of interpretable 
component spectra may be achievable in a purely data-driven man- 
ner (Kaban. Nolan & Ravchaudhurv 2005). In this work, we fol- 
low a more rigorous methodology, explicitly encoding the above 
hypothesis into appropriate probabilistic data models. We then em- 
ploy a Bayesian formalism to solve the inverse modelling problem, 
as well as to objectively assess to what extent such a hypothesis is 
supported by the data, in terms of both physical interpretability and 
the ability for predictive generalisation to previously unseen data 
of the same type. 

The Bayesian framework allows us to automate several func- 
tional requirements within a single statistically sound formal 
framework, where all modelling assumptions may be explicitly en- 
coded. Tasks ranging from inference and prediction to inverse mod- 
elling and data explanation, data compression, as well as model 
comparison are all accomplished in a flexible manner, even in con- 
ditions of missing values and measurement uncertainties. 

In this study we start from a set of synthetic spectra of stel- 
lar populations of kno wn age and me t allicit y from the single stellar 
population models of Ijimenez et all i2004) . We also have assem- 
bled a set of observed ultraviolet (UV)-optical spectra of early-type 
galaxies, to which we have fit two component stellar population 
models, and thus we have some knowledge of their star formation 
history (i.e. ages, metallicities and mass fractions of the component 
stellar populations). We then perform our data-driven Bayesian 
analysis on both synthetic and observed spectra, and compare the 
derived parameters with those obtained from detailed astrophysical 
modelling. 

In §2, we describe the set of 21 observed early-type galaxy 
spectra that we analyse here, and describe the synthetic stellar pop- 
ulation spectra. §3 details the Bayesian modelling of the data. Our 
results are presented in and discussed in §4. In §5 we discuss the 
significance of this work on characterising E+A galaxies. Our con- 
clusions are outlined in §6. Appendix A gives the detailed mathe- 
matical framework required to implement our methodology, specif- 
ically for the data model estimation of the variational Bayesian rec- 
tified factor analysis — a flexible data model for the factorisation 
of positive data that we developed. 



2 THE DATA 

2.1 Synthetic stellar population models 

We perform the linear independent transformation analyses on a 
set of two-component model stellar population spec t ra. We use the 
single stellar population models of Ijimenez et al\ i2004h . which 
have a Salpeter initial mass function, across the same wavelength 
range (2500—8000 A) as the observed spectra, and create a two- 
component model flux, F2pop.x. This is defined as 

F2pop,\ oc {■mifzi,\,ti + ■mjfzj,\,tj) (1) 

where F2pop,\,t is the model flux per unit wavelength in the bin 
centred on wavelength A, which is the sum of the two single- 
metallicity, single-age model fluxes fz^.x.ti and fzj,\,tj, which 
have metallicities and ages, Zi,ti and Zj,tj respectively. Here 
rrii , rrij are the fractional contributions (by stellar mass) of each 
component to the total model population. We set rui + ruj — 1, 
where rrii can take values 0.25 or 0.5. Ages assigned for ti,tj are 
0.03, 1, 3, 7 and 12 Gyr, and the metalhcities are 0.004, O.OI, 0.02, 
0.03 and 0.05, making a total of 50 two-component model spectra. 
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2.2 Observed early-type galaxy spectra 

We have also compiled a set of twenty-one early-type galaxy spec- 
tra from the archives for this study, across the wavelength range 
2500—8000 A, although not all objects have data covering the com- 
plete range. The spectra of the galaxies in the sample are shown 
in Fig.Q As the galaxy sample was assembled from those nearby 
early-type galaxies with both UV and optical spectra, from vari- 
ous archives, it does not represent a statistical sample. The sources 
for the data are listed in TableQ together with the telescopes with 
which the optical spectra were observed. The UV spectra were ob- 
served using the International Ultraviolet Explorer (lUE), except 
for NGC 3605 and NGC 5018, which were observed with the Faint 
Object Spectrograph (FOS) on the Hubble Space Telescope (HST). 

Since these spectra are compiled from sources with varying 
spectral coverage, the resulting data matrix has some missing val- 
ues. The regions of incomplete wavelength coverage are obvious in 
Fig.Q It is one of the strengths of our analysis that these missing 
data regions can be recovered; this is described in i; il3.1.3ll3"4ll3.5l 
Another of the advantages of the probabilistic framework adopted 
here is that, in contrast with classic non-probabilistic methods, the 
observational errors on each data point can be taken into account 

(EH- 

As the data matrix on which the analysis is performed must 
have the same wavelength binning for all the galaxies, the spec- 
tra are de-redshifted and binned to the same wavelength resolution 
as the synthetic stellar population spectra (~20 A; the observed 
spectra have wavelength resolution which varies from ~5-18 A). 
For each galaxy, the various spectral sections, taken in turn, are 
normalised to unity in the overlap regions, and spliced, to create a 
single continuous spectrum for each object. Although the different 
spectral pieces for any one object were observed separately, and 
with different telescopes, it should be noted that the apertures used 
were similar, and therefore each section probes a similar part of 
each galaxy, and the shape of the continua in the overlap regions 
agree well. 



3 DATA MODELLING 

Models are simplifications of reality representing relevant aspects 
while glossing over irrelevant details. Data models are constructed 
from experimental evidence (data) and available prior knowledge. 
Once a data model is estimated (which can be done off-line), it 
offers a tool for making inferences and predictions in a consistent 
framework. 

In Bayesian data modelling, all informatio n is encoded in 
probability distributions iBemardo & Smithll994) . The joint prob- 
ability distribution of the observed data and other unobservable 
variables of interest (such as factors or component stellar popula- 
tions of the representation as well as the parameters of the mixing 
process, etc.) need to be designed. The unobservable variables in 
the model are often referred to as hidden or latent variables. 



3.1 Designing and building an appropriate latent variable 
architecture 

3.1.1 The independent factor representation model of the 
spectrum 

The spectrum of a galaxy or of a stellar sub-population is essen- 
tially a vector, over a binned wavelength range, representing the 
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Figure 1. Rest-frame spectra of the 21 galaxies in our sample. See Table 
Qand i|2|for details. The dotted lines mark some of the absorption features 
in the spectra which are typically strong in young stellar populations, and 
the dashed lines mark some of the absorption features which are typically 
strong in old, metal-rich stellar populations. From left to right, the absorp- 
tion line species are: Mgll (2799 A), He (3970 A), H(5 (4102 A), H7 (4340 
A), H/3 (4861 A), Mgb (5175 A), NaD (5893 A), Ho (6563 A), TiO (7126 
A). The spectra are arbitrarily shifted along the flux axis for the sake of 
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Table 1. Table of references and optical telescopes for the observed early- 
type galaxy spectra. 



Object 


Reference 


Optical telescope 


NGC 0205 


S-BCK 


0.9m, Kitt Peak National Observatory 


NGC 0224 


S-BCK 


0.9m, Kitt Peak National Observatory 


NGC 1052 


S-BCK 


0.9m, Kitt Peak National Observatory 


NGC 1400 


Q 


ESO 1.52m, La Silla 


NGC 1407 


S-BCK 


0.9m, Kitt Peak National Observatory 


IC 1459 


R,Q 


ESO 2.2m, 1.52m, La Silla 


NGC 1553 


R,Q 


ESO 2.2m, 1.52m, La Silla 


NGC 31 15 


R,Q.B 


ESO 2.2m, 1.52m, La Silla 


NGC 3379 


Q 


ESO 1.52m, La Silla 


NGC 3557 


Q,B 


ESO 2.2m, 1.52m, La Silla 


NGC 3605 


P,J 


F.L.Whipple Observatory, 1.5m 


NGC 3904 


R,Q.B 


ESO 2.2m, 1.52m, La Silla 


NGC 3923 


R,Q,B 


ESO 2.2m, 1.52m, La Silla 


NGC 4374 


R,Q 


ESO 2.2m, 1.52m, La Silla 


NGC 4472 


R,Q 


ESO 2.2m, 1.52m, La Silla 


NGC 4621 


Q 


ESO 1.52m, La Silla 


NGC 4697 


R,Q,B 


ESO 2.2m, 1.52m, La Silla 


NGC 5018 


P,R,Q,B 


ESO 2.2m, 1.52m, La Silla 


NGC 5102 


R,Q.B 


ESO 2.2m, 1.52m, La Silla 


NGC 7144 


RQB 


ESO 2.2m, 1.52m, La Silla 


NGC 7252 


V 


1.93m, Haute Provence Observatory 



The UV spectra were observed with the lUE, except for NGC 3605 
and NGC 501 8, which we r e obse rved using EOS on the HST 
Sources: J: Jansenefo/. ( 2000); P: Ponder ero/. (1998); S-BCK: 
^orchi-Bergmann a/. (1998); R: Bica & Alloin, unpublished; Q: 
Imca&AUoiril 11987a); B; Bica & Alloin ( 1987b); V; Bica & Alloin, 
unpublished, from t'tp://cdsarc. u-strasbg.fr/cats/III/2l9/ 



flux per unit wavelength bin. Consider a set of A'^ spectra of early- 
type galaxies, each measured at T different wavelength bins. We 
denote by cct £ TZ'^ the vector formed by the flux values at the 
wavelength bin indexed by t, as measured across the A'^ spectra. 
The N X T matrix formed by these vectors may be referred to as 
X and single elements of this matrix will be denoted as Xnt- Simi- 
lar notational convention will also apply to other variables used. 

As stated above, the hypothesis of our model is that each of 
the A'^ observations can be represented as a superposition of K < 
N latent underlying component spectra or factors f{st) € TZ^ , 
that are not observable by direct measurements but only through 
an unknown linear mapping A G 72.^^^. Formally, this can be 
summarised as follows: 

xt = Af{st) + e. (2) 

In J2}' the first term of the right is the so-called systematic com- 
ponent and e is the noise term or the stochastic component of this 
model, both of which will be further detailed below. The noise term 
e is assumed to be a zero-mean independent and identically dis- 
tributed Gaussian, with variance e~""=" . The diagonal structure of 
the covariance accounts for the notion that all dependencies that 
exist in Xt should be explained by the underlying hidden factors. 

The K components are assumed to be statistically indepen- 
dent, which is a standard assumption of independent factor models. 
The role of the function / here is a technically convenient way of 
imposing certain constraints on the components f{s) in order to fa- 
cilitate the estimation of interpretable factors. This will be detailed 
in subsection 3.1.4. 

As in nearly all data modelling applications of factor models, 
we hope that the feature transformation produced by the mapping 
A will reveal important structural characteristics of the collection 



of spectra. It is part of our data modelling design to investigate 
various ways of facilitating this, as will be detailed in §3.1.4. 

3.1.2 Dealing with measurement errors in the data model 

Classical non-probabilistic data analysis tools do not offer the flex- 
ibility required for taking into consideration the uncertainty that is 
associated with all real measurements. It is thus an important prac- 
tical advantage of the probabilistic framework we adopt here that it 
allows us to take such measurement errors into account as an inte- 
gral part of the data model. This is achieved simply by making the 
'clean' vectors Xt in J2j become hidden variables of the additional 
error model 

Vnt = Xnt + e„t, (3) 

where e„t is a zero-mean Gaussian noise term with known standard 
deviations cjnt, for each individual measurement n = 1 : N,t — 
1 : T. 

3.1.3 Dealing with missing values in the data model 

Our probabilistic framework also allows us to treat missing val- 
ues in a principled and straightforwa rd manner under the assump- 
tion that they are m issing at random jGhahramani & JordarJI 19941 
IChan & Leell2003 ^. Splitting each datum vector into missing and 
observed parts, i.e. yj — {yf, vT"^)' the missing entries simply 
marginalise out: 

pivt) = J p{yt)dyT- (4) 

The superscript 'o' will be omitted hereafter for simplicity, noting 
that unless otherwise stated, y^ will refer to y° and missing entries 
are discounted in all the update equations that follow. 

Having completed the general structural specification of the 
proposed probabilistic model, in the following subsection, we will 
make more specific assumptions about distributional form of the 
latent spectral representation that emerges from such probabilistic 
models. 

3.1.4 The distributional forms employed in the data model 

In order to be able to infer the parameters of the factor model J2j' 
the form of the distributions of the involved variables, which may 
have a key impact on th e subsequent in terpretability of the result- 
ing probabilistic model jMacKavlEool . need to be defined. That 
such assumptions are made explicit is indeed an important feature 
of the Bayesian framework adopted here. An important advantage 
of this approach is that of the several alternative forms that could 
be assumed for these distributional forms (we could call these "hy- 
potheses"), the one best supported by the available data can be de- 
termined automatically from the data evidence. More details can be 
found below, in i )3.3l 

Before proceeding, we note that such assumptions are present 
in all automated analysis techniques, although in the case of non- 
probabilistic methods they are implicit and fixed. In particular, it 
has been shown that the widely used technique of Principal Compo- 
nent Analysis (PC A) implicitly assumes a standard Gaussian rep - 
resentation and isotropic Gaussian noise iTipping & BishoI^1999^ . 
Although PCA ha s been useful in a number of wa ys in astrophysi- 
cal data analysis jGmnolly_eLfliJll99j;|FolkeSj_L ahav & Maddq3 
1 19961 iRonen et a/.lll999l : iMadewick et fl/.ll2003dlbl> . there is no 
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Figure 2. Graphical model of the proposed latent variable architecture (i.e. 
the data model). Nodes represent random variables in the data model, and 
arrows indicate causal relationships between variables. Out of the nodes, 
rectangles refer to observed variables: measured fluxes, y, and known mea- 
surement errors, Vy. Circles and hexagons refer to latent (hidden) variables: 
clean flux, x, along with their variance parameters, Vx, mixing coefficients, 
a, and components, s, along with their component-wise means, nis, and 
variance parameters, Vg). Hexagons indicate the latent variables on the top 
of this hierarchy, for which further hierarchical priors may be considered. 

guarantee that the assumptions it makes are appropriate in our case. 
In particular, the assumption of a Gaussian distribution for the spec- 
tral elements in our present problem would imply that they are 
equally likely to have positive and negative flux values. This may 
make the physical interpretation of the spectra of the individual 
stellar populations (factors) problematic. Indeed, the physical va- 
lidity of the isotropic struct ure of the Gaussian noise imposed by 
PCA has been questioned bv' Madgwick et al\ f2003bh . 

It is therefore worthwhile to consider alternative factor models 
to relax such unrealistic constraints, and to introduce more justifi- 
able constraints (such as the positivity of the factors). This is what 
we do in this section. 

In this study we consider four factor models with differing 
assumptions made regarding the distributional forms employed. 
These are summarised in Table l3T!4l The first three of these re- 
fer to factor models existing in the literature, and the fourth one has 
been recently developed by us (Harva & Kaban 2005). All factor 
models have been implemented in a Bayesian formulation (the so- 
called Variational Bayesian framework, see details in Section lTsl 
in order to ensure a consistent formal framework. The four factor 
models are described below. 

VB-PCA The first row of Table l3. l^describes th e Variational 
Bayesian PCA (VB-PCA) data model (Bishori 1999). Due to the 
widespread use of PCA, we will consider it as the baseline hypothe- 
sis. As already mentioned, it assumes that the spectral factors of the 
representation follow a Gaussian distribution and there is no posi- 
tivity constraint. This is summarised in the columns p(s) and /(s) 
respectively of the first row of Table 13. l"4l Further, it assumes that 
the mixing coefficients of the factors (i.e. the elements of the matrix 
A of equation (2)) follow a standard Gaussian and finally that the 
noise term has an isotropic variance structure (i.e. Vx„ ~ Vx is the 
same for all n). 

VB-FA In Table B.l^ the Variational Bayesian Factor Analy- 
sis (VB-FA) data model l BishoDtl99 9) relaxes the constraint of the 
isotropic structure of the Gaussian noise imposed by PCA. Indeed, 
such a constraint may be physically unjustified iMadgwick et al\ 



Table 2. Distributional assumptions for the various data models 



Model 


p{s) 




p{a) 




VB-PCA 


J\f{s\ms, e~ 




A/'(a|0, 1) 


S Vx 


VB-FA 


J\f{s\ma, e~ 




A/'(a|0, 1) 




VB-PFA 






Ar«(a|0,l) 


S Vx„ 


VB-RFA 


M{s\ms, e~ 






max(0, s) Vx„ 



Data Models: VB-PCA: Variational Bayesian PCA; VB-FA: Variational 
Bayesian Factor Analysis; VB-PFA: Variational Bayesian Positive Fac- 
tor Analysis; VB-RFA: Variational Bayesian Rectified Factor Analysis. 
The columns summarise the assumptions for the distributional form of 
each variable of Eq. JjJ for each of these four data models. Indices have 
been dropped. The columns p{s) and p{a) indicate the form of the prob- 
ability density assumed for the elements of st and A respectively. The 
column /(s) specifies the form of the function /() of Eq. |2j, and Vx„ 
provides the structural form of the noise variance for the various data 
models (distinct variances for all data dimensions n except in the case 
of VB-PCA). 

'2003b"). All other assumptions made are in turn identical with those 
of VB-PCA. 

VB-PFA The above two models do not require the individ- 
ual factors, which are spectra of the stellar sub-populations, to be 
non-negative, which they need to be in order to be physically ac- 
ceptable. The remaining two models considered here do so. The 
third row of Table l3T!4l describes the Variational Bayesi an Posi- 
tive Fa ctor Analysis (VB-PFA), a data model developed bv lMiskiJ 
flOOff), that, in contrast to VB-PCA and VB-FA, imposes the posi- 
tivity constraint. It does so by assuming that each individual spec- 
trum p{s) (factor) of the representation follows a zero-mode rec- 
tified Gaussian distribution (essentially a Gaussian restricted to its 
positive half and renormalised, defined in Appendix A4). VB-PFA 
also restricts the mixing coefficients p{a) to be positive, assumed to 
be standard rectified Gaussians, and maintains the same structural 
form for the noise variance Vx„ as VB-FA. 

The positivity constraint just introduced is meant to fa- 
cilitate the interpretability of the factors as component spec- 
tra. Let us observe, however, that in VB-PFA, the 0— mode 
of p{s) would mean that the recovered components would 
have a low flux close to zero occupying most of the spec- 
trum. Although there has been a great deal of interest in fac- 
tor models that have suc h sparsely distributed factors for a va- 
riety of oth er purposes ( Mi.skin & Mackav 2001; Lee & Seuijd 
| 200lt lllinton & Gha hramani 1997; Socci. Lee & Seung 199^ ^ 
Frev & HintoiJl999h . including image and text related applications 
as well as neurobiological modelling, such a sparsity assumption 
may be restrictive and often unjustified fo r the modelling of stella r 
population spectra. However, as shown in iHarva & Kab^ fe005^■ 
modifying VB-PFA to relax the 0— mode assumption is technically 
impossible. Thus we developed a different way of imposing the 
positivity constraint, which also permits the mode of p(s) to be es- 
timated from the data in a flexible manner. We did not discard the 
hypothesis of VB-PFA either, since it has been applied to stellar 
population spectra in the literature l Miskin.200(l') and we do not 
wish to discard any existing hypotheses based on possibly subjec- 
tive intuitions. 

VB-RFA The last row of Table 13.1.41 describes the Varia- 
tional Bayesian Rectifi ed Factor Analysis (VB-RFA) data model 
jHarva & KabarJ 120051) . where the positivity constraint over the 
factors is ensured through the non-differentiable function /(s) = 
max(0, s). With this solution, the factors are now /(s) and they 
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are guaranteed to be positive. Then the variables s become some 
dummy-arguments of /(), which are now safely modelled as Gaus- 
sians (column p(s) of 4-th row of Table lS.l'!?! . Therefore, both the 
means and variances of these Gaussians can now be estimated from 
the data. The resulting model allows more flexibility regarding the 
shape of the factors compared with VB-PFA, while ensuring the 
positivity of the factors. Finally, the mixing coefficients p(a) are 
assumed to be distributed as standard rectified Gaussians. Note that 
in this context the zero-mode constraint is appropriate since the 
sparsity of the mixing matrix A translates to assuming a smooth 
basis transformation. The technical novelty and c omputational ad- 
vantag es of the VB-RFA approach are detailed in lHarva & KabaiJ 
j200^ . 

As it should now be clear, our scope is neither to apply any 
off-the-shelf data analysis method in an unjustified manner, nor to 
rely on subjective intuitions, as both of these approaches may easily 
lead to biased results. Instead, we have constructed four different 
hypotheses which we consider equally likely a priori. The Bayesian 
framework allows us to decide which hypothesis is best supported 
by the data. 



3.2 The overall model architecture 

The architecture for any of the four data models defined above can 
be graphically represented as in Fig.|2| This is the joint density of 
the observational data and all other variables of interest. Nodes de- 
note random variables, rectangles refer to observed variables (i.e. 
derived from observational data), and circles and hexagons refer to 
unobservable variables that we are hoping to infer from the data 
through the formulated probabilistic model. Hexagons denote hid- 
den variables on the top of the hierarchy^ . Arrows signify genera- 
tive causal dependency relations. 

We now outline the variational Bayesian estimation procedure 
associated with this architecture. 
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Figure 3. Tfie negative log evidence for the variationat Bayesian anafyses of 
the observed spectra, as a function of the number of factors (components), 
i.e. stellar sub-populations. The performance of VB-PCA is clearly inferior. 
VB-RFA has the highest log evidence, with no significant improvement in 
increasing K, the number of components, from two to three. This shows 
that the spectra of early-type galaxies used here can be safely modelled as 
the sum of the spectra of two distinct major stellar components, and the 
inclusion of more components is not required by the data. 



log evidence, based on Jensen's inequality. 



logp(Y) = log / dep{Y,e) = log / deq{Q) 



> / d0g(0)log^^^ = {logp(y,0)),(e)-(logg(0)),(e) 



(5) 



3.3 Estimation of a data model 

Each of the described four data models is estimated using the varia- 
tional Bayesian methodology described in this section. The specific 
expressions obtained for the VB-RFA data model are provided in 
the Appendix. To make the notation concise, we will refer to the 
latent variables by 9 and to the data by Y. Since handling the pos- 
terior di stribution p(0 \ Y) is intractable, we resort to a variational 
schem e /Jord an e; a/.lll999HLapDalaineiJll999l:lRaiko et fl/.ll2005t 
lAttia s 2000), where an approximative distribution q{d) is fitted to 
the true posterior. This is done by constructing a lower bound of the 



In the BayesBlocks based implementation iValpola et q/.|2003|) adopted 
here, a maximal flexibility is ensured by further formulating hierarchi- 
cal priors on all variables rris^, Vs^. and v^^, as follows: p{vx„) ~ 
Mimv^ , e~""i ), with parameters common to all components of the vari- 
able, and these parameters rriy^ and will then have vague priors in 
the form of a zero mean Gaussian having a large value for variance. The 
definition is analogous for rus^. and Vs^.. These additional variables have 
been omitted from the diagram on Fig.|2|for simplicity, as they ai'e part of 
the relatively standard technicalities concerning a full Bayesian approach 
I Raiko et al. 2005j rather than an essential part of the application-specific 
model design. 



where {.)q denotes expectation w.r.t. q. 

This inequality holds for any distribution due to the 
convexity of the logarith m. A fully-factorial app r oximation of 
the po sterior cMacKav. .20031: iLappalaineiJ Il999l: lAttiasI l200(i 
Bisho'El ll999l : iBishop et a/.ll2002t iRaiko et alWlOQ^ will be em- 
ployed here, which is also known to lead to estimation algorithms 
that can be implemented efficiently by using local computations 
only iLappalainen 1999; Raiko ef g/.l boOSt IValpola et fl/.ll2003t 



iBishop et flZ...200 2) The best fully-factorial approximate posterior 
(referred to as the variational posterior) is c omputed by func- 
tional maximisation of l5j> which can be shown I' Lappalainenl 199^ 
[iVIacKav 2003; Jordan e/oZ. 1999 ) to be equivalent to min imising 
the KuUback-Leibler divergence <Cover&Thoma3ll99lh of the 
variational posterior from the true one. The fully-factorial varia- 
tional posterior of the data model in Fig.|2|is simply of the form 

T N N N K 

= n n ^ n ^ n n 

t — l n — 1 n — 1 n — 1 fc— 1 

K T K 

X WW<l{skt)>^Wq{ms^)q{vs^). (6) 



Therefore, Js} further decomposes into one plus twice as many 
terms as there are latent variables in the data model and the optimi- 
sation w.r.t. the individual variables is separately performed. The 
variational posteriors thus obtained, along with the statistics that 
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are required for an implementation'^, are given in Appendix lA 1 I for 
tlie VB-RFA model. Some of these posterior computations are non- 
trivial. Readers interested in more detailed derivations are referred 
to|Harva & Kaban ( 2005). 

The model estimation algorithm then consists of iteratively 
updating each parameter's posterior statistics in turn, while keeping 
all other posterior statistics fixed, until a convergence criterion is 
achieved. For each update, the evidence bound J5j is guaranteed not 
to decrease Jordan et al. ,1999; MacKay 2003) , and convergence 
to a local optimum is guaranteed. It can also be shown that due to 
the fully-factorial posterior approximation all updates are local, 
i.e. requiring posterior statistics of the so called Markov blanket 
only. That is, for updating any of the variable nodes, the posterior 
statistics of only its children, parents and co-parents are needed. 
T his has been exploited in the efficient imp lem entation realised 
in 'Valpol a eTal] j2003h . iBishop et al\ j2002l) and IWinn & Bishod 
(2005), and has also been adopted in this work. The scaling of the 
resulting variational Bayesian estimation algorithm is thus multi- 
linear in the number of factors, observations and wavelength bins 
of the model. These can also be easily followed from the further 
details given in Appendix lAll 

The evidence bound Jsj can be used both for monitoring the 
convergence of the algorithm and, more importantly, for compar- 
ison of different solutions or models. The quantities required for 
computing this are given in Appendix IA2I 



3.4 Using a data model to recover missing elements of the 
data 

We have already stressed the advantage of the probabilistic 
Bayesian framework adopted here in terms of the ability of the re- 
sulting data models to predict and recover incomplete data. Recov- 
ering missing values can be achieved by computing the posterior of 
the missing entries given the observed entries: 

i{y7\yt) = / deq{e)p{yT\e). (7) 

An approximate mean value of this posterior provides the imputa- 
tion for the missing entry. This is discussed in Appendix IA3I 



3.5 Using a data model for recovering missing elements from 
new measurements 

Due to the fully generative nature of the adopted framework, once 
the data model parameters are estimated from a training data set, 
inferences and predictions can be made for new, previously unseen 
measurements. This involves performing the inference for those 
terms of the variational posterior q(6) that are conditional on the 
type of new measurements (new galaxy spectra in the same range 
or various sub-ranges of wavelength for the same galaxies). Using 
these, as well as the remainder of terms of q{6) as estimated from 
the training set, we can then proceed to computing Results will 
be presented in Section 4.3. 



^ All algorithms investigated here have been implemented as pai1 of the 
BayesBlocks software i ValDola el al. 2003; Raiko et al. 2005.) and may be 
made freely available upon publication. 



4 RESULTS 

One of the strengths of the Bayesian evidence framework employed 
in this study is that the selection of the best-supported hypothesis 
can be automatically achieved, in a quantitative manner, based on 
the data evidence. This is known as model selection. 



4.1 Model selection results: how many major stellar 
populations do we need? 

Here we seek a criterion to examine which data model performs 
the best, as well as to find the dimensionality of the representation 
space. We would like to have a compact model (small number of 
factors) both to avoid over-fitting the data, and also for the results to 
be physically interpretable. At the same time, we would like to keep 
an adequate number of factors to guarantee a good approximation 
of the data. Apart from the interpretability issue, the Bayesian ev- 
idence framework can be used to automatically select the optimal 
model and its order from those investigated. Our preferences, mo- 
tivated by the purpose of the application (e.g. data interpretation, 
missing value prediction), expressed as prior belief, would then be 
used to weight the evidence obtained from the data, in order to 
make a final Bayesian decision on the order of the model. 

Fig.|3|shows the negative of the log evidence for the four data 
models (strictly, the negative log evidence bound), versus the model 
order, K (i.e. the number of components), for the observed spec- 
tra (Fig.0. It can be seen that the VB-RFA model outperforms the 
other three models. We also see that increasing the number of fac- 
tors beyond two does not significantly increase the evidence, which 
suggests that the bulk of spectra of an early-type galaxy can be rea- 
sonably well and compactly described by two factors or compo- 
nents, each interpretable as a stellar sub-population. This is useful 
for input into more detailed physical models of early-type galaxy 
spectra. We will also see in i|4.2l that the K = 2 models produce 
two factors that are both physically meaningful, while increasing 
the model order to if = 3 does not bring any further interpretable 
factors. 



4.2 Physical interpretation of the results 

Here, we compare the results of the data-driven method outlined so 
far with an entirely independent determination of the star-formation 
history of our 21 early-type galaxies. 

In this more conventional approach, we have fitted two- 
component detailed physical models, constmcted from the single 
stellar population models of Jimenez et al. 1 2004), to the observed 
long-baseline spectra of our sample, as shown in Fig.Q The ages, 
metallicities and relative mass fractions of the component stellar 
populations are allowed as free parameters. We construct the two- 
component spectra as the linear superposition of the two compo- 
nent spectra 0, and allow ages 0.01-14.0 Gyr, and metallicities 
Z =0.01,0.2, 0.5, 1.0, 1.5,2.5 and 5.0 times the solar value ( Zq). 
Again, the relative mass fractions have to be jointly constrained, 
rrii + rrij = 1, but rrii is allowed to vary from to 1, in steps 
of 0.02. A minimum-x^ fit was used to determine the best-fit val- 
ues of rai , Zi,ti, ruj , Zj and tj . The whole parameter space was 
searched, with the best-fitting parameter values corresponding to 
the point on the agei-agej-Zi-Zj-mi-mj grid with the mi nimum 
calcul ated y^. Details of the fitting process are given in iNolanI 
j2002h and lNolan et ai\ llOO^ . 
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Emitted A / A Emitted A / A 

Figure 4. Analysis of the two-component data model. Top: Synthetic stellar population spectra Ijimenez et a/.l2004l) : Top Left: Spectra of a population of 
age 0.7 Gyr, where metallicity, from bottom to top, is Z =0.2, 1.0 and 2.5 Zq; Top Right: age = 10 Gyr, same metallicities. The dotted Hnes mark some 
of the absorption features in the spectrum which are typically strong in young stellar populations, and the dashed lines mark some of the absorption features 
which are typically strong in old, metal-rich stellar populations. From left to right, the absorption line species are: Mgll (2799 A), He (3970 A), H<5 (4102 
A), H7 (4340 A), H/3 (4861 A), Mgb (5175 A), NaD (5893 A), Ho (6563 A). TiO (7126 A). Middle: The two components found from die various different 
linear independent basis transformation analyses, with the number of components K = 2, of the observed early-type galaxy spectra. The methods are, from 
bottom to top: VB-FA, VB-PFA, VB-PCA, VB-RFA. The recovered spectra are convincingly disentangled into one component (middle left) with young stellar 
population features (Mgll. He, US, H7, H/3, Ha: dotted lines) and shape, and a second (middle right) with the features (Mgb, NaD, TiO: dashed lines) and 
shape of an old, high-metallicity stellar population. Bottom: The two components found from the same analyses, but from the linear superposition of two 
synthetic stellar populations. The various curves refer to the same parameters as in the middle panels. The UV spectrum of the 'young' component recovered 
here rises more steeply than for the same component recovered from the observed spectra, due to the inclusion, in the models, of a very young (0.03 Gyr) 
synthetic stellar population, which is not seen in our observed spectra. 



4.2.1 Analysis of the observed spectra with the two-component 
data model 

In Fig. O synthetic spectra llimenez et a/.IIiOCmI) of young (0.7 
Gyr, left) and old (10 Gyr, right) stellar populations are shown, for 
metallicities 0.2, 1.0 and 2.5 Z0, in each of the upper panels, from 



top to bottom respectively. Below these are shown the spectra of the 
individual components, recovered from the four analysis methods 
described above, for both the observed galaxy spectra (middle pan- 
els), and linear superpositions of pairs of synthetic spectra (bottom 
panels). 
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Figure 5. Scatter-plots showing the con'elation of (top) the age of the 
younger stellar population and (bottom) the mass fraction of the smaller 
stellar population determined from two-component fitting to the observed 
spectra with the weight of the first component (ai) of the VB-PCA (left) 
and VB-RFA (right) linear basis transformation analyses of the observed 
spectra, with K = 2. A high value of ai clearly corresponds to a sub- 
stantial young (~ 1 Gyr) stellar population. The open circles are for those 
galaxies with a secondary population of age less than 1 Gyr These plots also 
show that the worst-perfonning (VB-PCA) and best-performing (VB-RFA) 
algorithms in this case yield almost identical results. 



In general, the similarity between the recovered components, 
and the corresponding model spectra, is striking, for all four meth- 
ods. In the younger components displayed on the left hand of Fig.|5] 
the 4000 A break, typical of a mature (~1 Gyr), is not seen, but 
strong Balmer absorption line features (He (3970 A), HS (4102 
A), H7 (4340 A), H/3 (4861 A)) can be clearly identified. These 
are typical features of the A stars which dominate the emission 
from young (< 1 Gyr) stellar populations. In the second (older) 
component, however, the 4000 A break is apparent, as are other 
features typical of mature, metal-rich stellar populations, e.g. ab- 
sorption features corresponding to Mgb (5175 A), NaD (5893 A) 
and TiO (7126 A). We can rule out either component being repre- 
sentative predominantly of either AGN emission or dust absorption 
as the characteristic shapes and features of these spectra would be 
very different from those we recover. 

However, the VB-PCA algorithm seems to "find" a spuri- 
ous sub-component with younger features (middle left panel) that 
has some features from old, metal-rich stellar populations (notably 
NaD). 

In Fig.|5| we show results for VB-RFA method (right panels), 
which has the the highest log evidence values, together with the 
VB-PCA method (left panels), which has the lowest, for compari- 
son. We plot the correlation between the first ('younger') compo- 
nent (oi) and the age of the younger stellar population (age2, top 
panels) and the mass fraction of the secondary stellar population 
component (m2/Mgai), as determined from the two-component 
model-fitting described above (bottom panels). The open points are 
those galaxies which contain a significantly young stellar popula- 
tion (~ 1 Gyr). 

Fig.|S|plots the results of the Spearman's rank correlation test 
for the weight of the first component ai and the age, metallicity and 
mass fraction of the two fitted stellar populations for the observed 
spectra with K — 2. A value of the Spearman's rank correlation 
coefficient \Rs\ > 0.5 indicates a strong correlation between the 
parameters. It can be seen that there is a strong correlation between 
tti and the ages of the younger stellar populations, and also between 




Figure 6. Spearman's rank correlation coefficient (Rg ) for the first compo- 
nent (ai) of the four linear basis transformation analyses with K = 2, of 
the observed spectra and the best-fitting parameters of the two-component 
synthetic stellar population fits to the data: solid: age of the older popula- 
tion; outline: age of the younger population; light hatched: metallicity of 
the older population; light cross-hatched: metallicity of the younger popu- 
lation; dark hatched: mass fraction of the smaller population; dark cross- 
hatched: mass fraction of the dominant population, mod Rg > 0.5 indi- 
cates a strong correlation. 





Figure 8. Scatter-plots showing the correlation of (top) the age of the 
younger stellar population and (bottom) the mass fraction of the smaller 
stellar population determined from the synthetic spectra fitting with the 
weight of the first component of the VB-PCA and VB-RFA linear basis 
transformation analyses methods (ai), and K = 2. The open circles are for 
those galaxies with a secondary population of age less than 1 Gyr A high 
value of a\ clearly corresponds to a significant young (^ 1 Gyr) stellar 
population. 



tti and the stellar mass fractions (miAfgai, m2/Afgai) of the com- 
ponent stellar populations. The correlation between ai and agei is 
weak, and there is no correlation between ai and the metallicities 
of the component stellar populations. The trends of the correlations 
with a2 follow similar, but opposite, trends to those of ai . This is 
not generally true for factor analysis, but arises here as a conse- 
quence of the nature of our data set. 

The rest-frame spectra of four of the galaxies in our sample, 
together with the reconstructed data model and the two constituent 
data model components are presented in Fig. Q The data model 
convincingly reproduces the observed spectra. 



4.2.2 Analysis of the synthetic spectra with the two-component 
data model 

Here we present the results of the analysis of the two-component 
synthetic spectra, constructed as described in i|2.1l They represent 
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Figure 7. Rest-frame spectra of four of the 21 galaxies in our sample (thick black line), with the reconstructed data model and the two constituent data model 
components (thin lines) superimposed. The data model convincingly reproduces the observed spectra. 




Figure 9. Spearman's rank correlation coefficient (Rg ) for the first compo- 
nent (ai) of the various linear basis transformation analyses with K = 2, 
of the two-component synthetic spectra and the best-fitting parameters of 
the two-component synthetic stellar population fits to the data: solid: age of 
the older population; outline: age of the younger population; light hatched: 
metallicity of the older population; light cross-hatched: metallicity of the 
younger population; dark hatched: mass fraction of the smaller popula- 
tion; dark cross-hatched: mass fraction of the dominant population, mod 
Rs > 0.5 indicates a strong correlation. 



every possible combination of the assigned ages, metallicities and 
relative stellar mass fractions, and thus allow us to investigate a 
wider range of known age and metallicity combinations than the 
observed spectra. It should be noted that not all combinations (for 
example, those with two components < 1 Gyr old) represent what 



we might expect to find in the total population of real early-type 
galaxies, although there could be local regions within these galaxies 
where, for example, young stars dominate the stellar population. 
We wanted to cover the complete range of possibilities in order 
to avoid making any assumptions about the stellar populations of 
early-types, which, as revealed even in our small sample of twenty- 
one spectra, can be quite varied. 

In this experiment, no noise has been added to the synthetic 
spectra. This is because the noise from variations in the spectral 
shape caused by differences in the metallicity is higher than any 
noise contributions which represent measurement errors. 

The bottom panels of Fig. |4| show the spectra of the compo- 
nents recovered using the same four analysis methods as used for 
the observed spectra. Again, the first component has a shape and 
features that are similar to those of a young stellar population, and 
the second component, those of a mature population. The shape of 
the UV part of the spectrum, together with the featureless nature 
of the spectrum above ~5000 A, are typical of very young stellar 
populations (~ a few x 10 Myr). These populations are present in 
the synthetic data set, but not in significant quantities in the ob- 
served data set, hence the differences between the recovered data 
components. 

In Fig. |8| as in Fig. |5| we see that a high value of ai (the 
weight of the 'younger' component from the Bayesian data model) 
corresponds to the presence of a <1 Gyr stellar population. Fig.|9| 
plots the Spearman's rank correlation test results for ai and the age. 
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metallicity and mass fraction of the synthetic populations. Here, 
there is an even stronger correlation between ai and the age of 
the younger population than in the case of the observed spectra, 
and also a strong correlation between ai and the age of the older 
population, but no significant correlation between ai and the mass 
fraction of the secondary population. However, in this exercise with 
synthetic spectra, we allowed only two values for m2 /Afgai (0.25 
and 0.5), which does not really allow for a meaningful determina- 
tion of the correlation coefficient. 

These results confirm that the value of ai is a clear indicator 
of the presence of a significant young stellar population, and is par- 
ticularly unambiguous in discovering components with very young 
(< 1 Gyr) stellar populations, as we found in i|4.2.1l 

4.2.3 Analysis of the three-component data model 

Fig. |3| shows that increasing the number of component spectra in 
the Bayesian models from two to three does not significantly in- 
crease the evidence for any data model. Indeed, for the positive 
factor analysis, the evidence decreases as K increases from two 
to three. These results suggest that the bulk of the stellar content 
of early-type galaxies can be represented by two components, one 
which represents contributions from old, metal-rich stellar popula- 
tions, the other representing contributions from young populations 
(Fig. llOL It is not easy to find a physical interpretation for the third 
component. Indeed, the third component is probably a combination 
of residuals from the sum of the spectra of two populations, com- 
bined with correlated components of observational errors. 

The results of the analysis performed on the observed spectra 
are presented in Fig. ll II The components ai and 02 correlate sig- 
nificantly with the age of the younger population (age2), except for 
the VB-RFA, where the correlation is with az rather than ai. The 
relative mass fraction of the two stellar population components (mi 
and m2) also correlate with a\ and 02, and, for the VB-RFA, with 
03 as well. 

For the synthetic two-population spectra, age2 correlates 
strongly with a\ and 02, for all the analysis methods, and also with 
as for the VB-RFA. However, there is no strong correlation (mod 
Rs > 0.5) for any of the data models with the relative mass frac- 
tion of the two populations, although the metallicity of the younger 
population {Z2) correlates strongly with 03 in the case of the VB- 
PCA and VB-FA methods. 

The differences between the correlation coefficients for the 
observed and synthetic spectra most likely reflect the different dis- 
tributions of age, metallicity and mass fraction in the two spectral 
samples. For example, the observed spectra are all dominated by 
high-metallicity components (2.5 and 5.0 Z©), with small contri- 
butions from very low-metallicity populations (0.01 Z0). Only one 
of the galaxies contains stellar populations with intermediate metal- 
licities. In contrast, the synthesised spectra can take the whole range 
of combinations of the seven allowed metallicities. 



4.2.4 Interpretation of the eigen-components of PCA 

Although we have managed to identify components with PCA here, 
it is a well known property of PCA that the solution it provides 
is rotation-invariant. That is, the eigen-components which have 
equal variances can be multiplied by an arbitrary orthogonal ma- 
trix to produce another equally good solution (in terms of satisfy- 
ing the same objective function). Therefore PCA is not guaranteed 
to achieve a successful separation of the underlying independent 
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Figure 11. Spearman's rank correlation coefficient (Rg ) for the first compo- 
nent (ai ) of the vai'ious linear basis transformation analyses with ii" = 3, of 
the observed spectra and the best-fitting parameters of the two-component 
synthetic stellar population fits to the data: solid: age of the older popula- 
tion; outline: age of the younger population; light hatched: metallicity of 
the older population; light cross-hatched: metallicity of the younger popu- 
lation; dark hatched: mass fraction of the smaller population; dark cross- 
hatched: mass fraction of the dominant population, mod Rg > 0.5 indi- 
cates a strong correlation. 



source signals that we are looking for. Methods that take into ac- 
count higher ord er dependencies need to be used for that purpose 
iHvvarinen. Karhunen & Oia 2001). 

The only 'lucky' case where PCA can identify components 
is if the underlying components have suitably different variances. 
This is true for the work presented here, but there is no reason why 
this would hold for data sets of galaxy spectra in general. Therefore, 
even if PCA has been successful here, it comes with no guarantees. 

Another way of looking at this problem is through the prob- 
abilistic formalism. PCA's implicit assumption (as shown in detail 
in TioDina & BishoD ( 1999)) is that the components it will find are 
Gaussian. Making this assumption explicit, by the probabilistic for- 
malism that we adopted, highlights the lack of plausibility of this 
assumption: the spectra that we are looking for are not Gaussian, 
instead they are positive valued only, and have a rather skewed dis- 
tribution. This renders the use of PCA very problematic in applica- 
tions that require a clear interpretation of the components. 

In turn, by adopting the probabilistic formalism where any as- 
sumption in the data model must be made explicit, we are able to 
specify the distribution that corresponds to components that we are 
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Figure 10. Analysis of the three-component data model. Top panel: Synthetic stellar population spectra f Jimenez et «/."2004'): top \eft: for a stellar population 
of age 0.7 Gyr, with from bottom to top, metallicity Z = 0.2, 1.0 and 2.5 Zq; top right: age = 10 Gyr, same metallicities. The dotted lines mark some of 
the absorption features in the spectrum which are typically strong in young stellar populations, and the dashed lines mark some of the absorption features 
which are typically strong in old, metal-rich stellar populations. From left to right, the absorption line species are: Mgll (2799 A), He (3970 A), H<5 (4102 
A), H7 (4340 A), H/3 (4861 A), Mgb (5175 A), NaD (5893 A), Ha (6563 A), TiO (7126 A). Middle and bottom panels: The 3 components found from the 
various different linear independent basis transformation analyses of the model stellar populations, with K = 'i. The methods are, from bottom to top: VB-FA, 
VB-PFA, VB-PCA, VB-RFA. The first two recovered spectra ai'e broadly identified with the features of a young and old stellar population in the same way as 
the K = 2 components. The third component (bottom) possibly contains 'residuals', resulting from the fact that the first two components are not a complete 
detailed reconstruction of the true stellar population components. 



looking for. Therefore the chances that we can find what we are 
looking for, and not something else, are considerably increased. 
The ultimate conclusion is of course made based on visual inspec- 
tion of the recovered components, made by human experts. 

4.3 The prediction of missing values 

As a more objective criterion for the assessment of our data mod- 
elling compared to the results obtained by the fitting of detailed 
spectral models, here we investigate the predictive ability of the 
modelling approach when faced with missing elements of the spec- 
tra. 

We begin with demonstrating the recovered values for the 
missing elements along with their predicted uncertainties (poste- 
rior variances), as obtained from VB-RFA on the small real data 
set, compared with those from the best-fitting physical model. This 
is shown on the scatter plot in Fig. ll2l The correlation coefficient 
between the posterior mean predictions of the data model and the 
values imputed from the best-fitting physical model is 0.989, the 
SNR being 9.3. This is a strong indication that the data-driven anal- 
ysis and the physical analysis are well matched. 



However, in the case of the real data, the true values of the 
missing entries are unknown. Therefore, in the following section, 
we employ synthetic data to perform a more thorough and con- 
trolled objective evaluation of the predictive capabilities of our 
probabilistic data model model. Missing entries were simulated, 
and the true values were used for evaluation only. The synthetic 
data set was more diverse than the real one, as a variety of parame- 
ter combinations that can be simulated, in this process, may not ap- 
pear in real spectra. Therefore, a wealth of situations were created 
on which the generalisation abilities of the proposed method were 
tested. A global and fixed noise level of a'^t ~ 0.01 was used in 
the data models in these experiments . We studied the prediction 
performance for both two and three component models {K — 2, 
K — 3) on new, previously unseen measurements, in the two sce- 
narios as reported below. In this setting, the evidence bound peaked 
at K — 3, as can be seen in Figures fT3l and fT4l 



^ Here, ant can be used to both control the compactness of the represen- 
tation and to simulate a level of measurement uncertainly similar to that 
encountered in real data. 
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Figure 12. Predictions for the missing values provided by the VB-RFA 
model versus the physical model. Values of the x-radii of the ellipses rep- 
resent errors (one standard deviation) obtained from data imputation using 
synthetic physical models. The y-radius values correspond to one standard 
deviation of the posteriors of x, as obtained using VB-RFA. There are no 
missing values in the observed spectra in the central region of the wave- 
length range considered here, which accounts for the gap in the plotted data. 

4.3.1 Predictions for previously unseen spectra in the same 
wavelength range 

As the spectra of more and more galaxies become available and 
are added to public archives, the ability to make predictions for 
previously unseen spectra, that have a small percentage of missing 
values within the same overall wavelength range, is an important 
practical requirement. 

It is therefore desirable to be able to infer reliable estimates of 
missing values, based on a few measurements and the probabilis- 
tic representation already created from previously seen spectra. In 
this scenario, Fig.^|shows the prediction results obtained by VB- 
RFA when varying the percentage of missing entries in the test set. 
The predictions are surprisingly accurate up to a high percentage of 
missing entries among the previously unseen spectra. Effectively, 
the high redundancy across the spectra, due to the fact that mature 
galaxies are very similar to each other, makes this scenario rela- 
tively easy to handle. 

4.3.2 Predictions for previously unseen wavelength bins 

The second scenario investigated here is when the predictions are 
to be made for missing elements for previously unseen values of 
wavelength, albeit within the same overall wavelength range. 

This is a more difficult task, since the flux varies significantly 
across different wavelength ranges. We conducted experiments in 
which the overall wavelength range for each spectrum was split 
randomly into training sets (regions without missing values in our 
case, though they need not be so) and test sets, and each model 
was trained on the training set. Then, to e nsure t hat we retain the 
missing-at-random (Ghahramani & Jorda3 ll99'4l) assumption, for 
randomly picked ranges in the test set, "missing" values were pre- 
dicted based on the trained model and the prediction errors were 
averaged over all missing values. This procedure was repeated 10 
times with 50 different combinations of training and test sets for 
each spectrum. 

The results are shown in Fig. 1141 The percentage of missing 
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Figure 13. Predictive performance from previously unseen spectra using a 
VB-RFA model with K = 2 (bottom) and X = 3 (top), plotted against the 
percentage of missing entries from the test-set. 
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Figure 14. Predictive performance from previously unseen wavelength bins 
using a VB-RFA model with K = 2 (bottom) and = 3 (top). As before, 
the X-axis indicates the percentage of missing entries from the test-set. 

values was varied between experiments as is indicated on the x axis. 
With a complete training set, the predictions are still pretty accurate 
even when up to 60% of the bins from the test set are missing. 



5 FINDING E+A GALAXIES 

So far we established generic data-driven procedures for finding ev- 
idence of more than one stellar sub-populations, significantly dif- 
fering in age and metallicity-related characteristic features, in the 
integrated spectra of galaxies. Here we show an application to a par- 
ticular data mining project which would benefit from this method, 
once large sets of galaxy spectra are available for public use. 

The determination of the star formation history of early-type 
galaxies has important implications for the much-debated issue of 
their formation and evolution. Observational evidence suggests that 
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at least some elliptical galaxies already ha ve the bulk of their stars 
at high redshift (z > 2) (e.^. |Renzini 199i lNolan ef fl/.l2003l) . but 
in hierarchical structure formation models, ellipticals can form at 
lower red shift from late-type merging (e.g. White & Rees 1978; 
kauffma nn. White. & Guiderdonil fl993l) . A vital link in under- 
standing how early-type evolution is driven, and under what en- 
vironmental conditions, is the study of E+A galaxies. 

The spectra of these galaxies are characterised by strong hy- 
drogen Balmer absorption-lines together with a lack of significant 
[Oil] 3272 A emission, implying a recent (SI Gyr) starburst, but 
no significant on-going star formation. The overall shape of the 
spectral energy distribution of these galaxies is that expected for 
an early-type galaxy. They may represent an intermediate stage be- 
tween disc-dominated rotationally supported systems and spheroid- 
dominated, pressure-supported systems. 

The physical mechanisms responsible for the triggering and 
subsequent truncation of star-formation in E+A galaxies are cur- 
rently unclear. They may be the result of disk-disk merging (or 
galaxy interactions), or cluster-related phenomenon, e.g. ram- 
pressure stripping, tidal stripping. However, although at interme- 
diate redshift, it has been suggested that E+A-type galaxies are 
prevalent in clusters (e.g. Tran et al. 2003), which argues for a 
cluster-related mechanism, at low-redshift {z ~ 0.1), E+As occur 
in lower-density environments, which is more indicative of merging 
or interaction processes ( Zabludoff ef a/. 1996; Blake ef a/. 2004). 
Statistical studies of the environments, luminosities and detailed 
morphologies of these galaxies, across a range of redshifts, are nec- 
essary in order to uncover the physical processes governing their 
star-formation history. 

The possible relationship between early-type evolution and 
E+A galaxies has not as yet been well-studied, as E+A systems 
are rare (< 1 percent of the overall zero-redshift galaxy population, 
IZabludoff et al\ il996h ). However, recent and on-going large-scale 
surveys such as the SDSS and the 2dFGRS offer the opportunity to 
study these galaxies in a statistically meaningful way, if they can 
be reliably identified. 

In this paper we have developed a rapid, automated method 
which allows us to identify those early-type galaxies containing 
a significant young (< 1 Gyr old) population; these galaxies are 
E+A galaxies. Fig. 1151 shows a typical E+A galaxy spectrum, 
together with its constituent population spectra. Unsurprisingly, 
given the diagnostic abilities of our components analyses, the com- 
ponent spectra look very similar to the spectra recovered from the 
compon ents analyses. 

iFormiggini & BroschI i2004h have analysed the lUE UV spec- 
tra of a sample of normal galaxies using PCA. One of their com- 
ponents was associated with a young stellar population, and an- 
other with a mature population, consistent with our results. How- 
ever, they were unable to correlate the principal components with 
the optical morphology of the galaxies. As we have prior knowl- 
edge of the component stellar populations in our galaxy sample, 
we have been able to make a more quantitative interpretation of our 
results: if the value of a\ (the weight of the 'young' component) 
for a particular early-type galaxy is > 0.5, there is a significant < 1 
Gyr-old stellar population present in that galaxy. 

Hence, our components analysis is able to rapidly identify 
post-starburst galaxies, without the uncertainties inherent in the 
measurement of the equivalent widths of the Balmer absorption 
and [Oil] emission lines. [Oil] emission in disk-dominated galax- 
ies may suffer from dust obscuration, and H7 and H/3 are subject 
to emission-filling. These, together with the high signal-to-noise re- 
quirements, introduce significant uncertainties in the measurement 



of these lines. In contrast, as our method utilises all the information 
present in the long-baseline spectra i.e. both the features present 
and the overall shape of the spectra, it is a more robust diagnostic 
of recent star-forming activity. 

Using the wealth of data available in the SDSS and the 2dF- 
GRS, with signal-to-noise as good or better than the data we 
analyse here, we may map precisely in what environments star- 
formation is suppressed (e.g. in groups or clusters), by rapidly and 
robustly identifying those galaxies hosting only mature (>1 Gyr) 
stellar populations. This allows us to investigate whether galaxy 
transformation (from star-forming to passive galaxies) occurs via 
e.g. ram-pressure stripping, tidal stripping, harassment or suppres- 
sion of galaxy-galaxy interactions. Although the wavelength ranges 
of the SDSS and 2dF spectra are shorter than those we have used, 
we can use a training set of data to identify the linear components 
(as suggested below), which can then be applied to the SDSS and 
2dF data. 

With the higher-resolution spectra of these surveys, 
and higher-resolution m odel spectra that are available (e.g. 
jBruzual & Charlolll2003h ). we expect the accuracy of our statis- 
tical estimates to increase, as we have more independent samples 
(wavelength bins) of the mixed components. Whether or not the 
number of components, K, would increase would depend on the 
data set used. Both the existence of additional components and 
the interpretation of any further components can only be explored 
by carrying out the experiment. However, we would not expect 
the increase in information that we obtain from higher-resolution 
spectra to lead to a decrease in K. 

We intend to extend our Bayesian modelling technique to 
identify other classes of galaxies, by using appropriately selected 
training sets of galaxy spectra and wavelength ranges. For exam- 
ple, starburst galaxies may be identified by the Wolf-Rayet 'bump' 
around ~4600-4900 A, which is a characteristic feature of the 
emission from the Wolf-Rayet stars in starburst galaxies. By first 
applying the Bayesian analysis to an appropriate training set of 
spectra, which include the spectral range of this feature, identifi- 
cation of these galaxies in a large sample should be possible. 



6 CONCLUSIONS 

Using rigorous Bayesian modelling, we have successfully devel- 
oped a time-efficient method for the analysis of the spectra of 
early-type galaxies. At ~a few seconds per galaxy for this anal- 
ysis, its speed compares very well with work by other authors (e.g. 
iHeavens et al .l200(t iFemandes eTaHl OOS ) . By comparing the re- 
sults of this analysis with the results of our unique two-component 
synthetic stellar population fitting to the long-baseline data, and 
with two-component synthetic spectra, we have shown that the in- 
dependent components analysis is physically interpretable. 

Two linear components are sufficient to describe the bulk of 
the stellar content of early- type galaxies. One of the components 
represents a young (<1 Gyr) stellar population, and the other an 
old, metal-rich population. The relative contribution of these two 
components allows us to robustly identify which early-type galax- 
ies host significant young stellar populations i.e. which are post- 
starburst galaxies. 

Of the four variational Bayesian methods investigated here. 
We find the variational Bayesian rectified factor analysis to be the 
most useful, since it has the greatest flexibility in the shape of the 
allowed distribution, whilst retaining the astronomically motivated 
requirement that the flux values in each component spectrum are 
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current datasets that include UV spectra, to work on much larger, 
solely optical datasets. 

In future studies, we therefore intend to: 

• estimate the data models for the 3700—8000 A range; 

• apply our technique to 2dF, 6dF, SDSS catalogued spectra, to 
identify E+A galaxies and their environments; 

• experiment with identifying starburst galaxies; 

• consider the robustness of our data models with respect to un- 
certainties in the calibration and response of the 2dF/6dF/SDSS fi- 
bres, and the effect of random continuum errors, via analyses of 
simulated data with additional noise factors; 

• extend the models to deal with stochastic censoring, i.e. non- 
ignorable (versus random) missing data. 



in 




o — < — ' — >— J — ' — ^— ' — I — 
3500 4000 4500 



Emitted A / A 

Figure 15. Top: the optical spectrum of an A3V star; second from top: spec- 
trum of a 12 Gyr stellar population; third from top: combined spectrum of 
the A star plus 12 Gyr stellar population: a typical E-|-A spectrum. Bottom: 
The observed spectrum of NGC 7252, from our sample, which contains a 
very young < 1 Gyr stellar population. 



non-negative. This method achieved the highest evidence and the 
highest correlation between the component weights and the popu- 
lation parameters determined from spectral fitting. The method al- 
lows the observational errors to be included in the formulation, and 
has the additional powerful attribute that it can be used to recover 
missing regions of spectral coverage. The strength of the correla- 
tion between the posterior mean predictions of the data model, and 
the values imputed from the best-fitting physical model, is a strong 
indication that the data-driven analysis and the physical analysis 
are well matched. 

We intend to apply our analysis to the early-type galaxies, in 
the 2dF, 6dF and SDSS catalogues, as a useful tool to investigate 
their evolution with envirormient and redshift. We have shown that 
in addition to efficiently finding early-type galaxies with significant 
young stellar populations, this method would also be an efficient 
tool to find other distinctive sub-classes, such as E+A galaxies, 
which is an important missing link in current population scenarios 
of the formation of early-type galaxies. 

In this work, we use model and observed spectra between 
2500-8000 A. However, to be useful as a data mining tool for z ~ 
galsixies in the SDSS, 2dF and 6dF archives, the Bayesian analysis 
has to work efficiently in the 3700-8000 A range equally well. Our 
next step will be to estimate data models that will learn, from the 
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APPENDIX A: IMPLEMENTATION DETAILS OF THE VB-RFA DATA MODEL 
Al Data model estimation 

Al.l Posterior updates for q{skt) 

The following simplifying notations will be used in this subsection: s = Sfct, nis = {ms^),Vs = (e^^fc )~^. Further, the following expres- 
sions, involving variables from the Markov blanket (MacKay 2003) of a variable s, will be required: Vx = |X^^=i(e"'"" ){anfc)| ™d 

X = |l]^=i{e''°'"){anfc) (^{xnt) — I]j^fc{anj){/(sjt)))}- Using these notations, as well as expressing /(s) = u{s)s, where u(s) is 
the step function, then the outline of the computation of the free form posterior q{s) and its sufficient statistics can be given as follows: 

q{s) = ^f^{x\u{s)s,Vx)f^{s\ms,Vs) (Al) 

= ^™u(3)A/'(s|s„(s),s„(s)) (A2) 

where 

Su{s) = {u{s)/vx + 1/Vs}~^ (A3) 

Su{s) = s {u{s)x/vx + nis/vs} (A4) 

w^^s) = J^ix\u{s)ms,u(s)vs + Vx) (A5) 

/OO 
dsW.u(s)M'is\Su(s), Su(s)) 
-OO 

1 

= ^ w„ erfc I (-l)"s„/^2i;:} . (A6) 

For more detailed derivation, see lHarva&Kabii]<2005h . 

Note the non-trivial variational posterior (A2) that resulted from the free-form approximation, a bimodal distribution (A6), which is 
essentially a mixture of two rectified Gaussians. 

The posterior statistics associated with q{s) that are required for updating the posterior statistics of other variables of the model are the 
first and second order moments, as follows. Denoting — erfc { (— l)"s„/V2su} (which is the erfc-term in <A6» , the i-th order moments 
of s and /(s) can be written as 

if is)) = J ris)qis)ds = M,{u = 1), (A7) 

where 
and 

{s') = ^Mi(M). (A8) 



1 



u=0 



The notation Mi{u) introduced in (A7) will be used also in <A1 1> . The remaining integral is the i-th moment of a rectified Gaussian density. 
The exact expressions for these, are given in i| A4I and more details on it can be found e.g. in iHarval i2004) . along with numerical issues 
concerning an efficient implementation. 



Al.l Posterior updates for q[msf, ) 

The posteriors q(ms^ ) = Nljris^ i'^sfc i "^-si- ) of 3 Gaussian form, where 



which provide the required posterior expectation (rris^) — nis^. Note that if a 'vague prior' was used on nis^,, then the posterior mean 
estimate (ms^,) becomes similar to a maximum likelihood estimate ^ J {sfct)/r, as then {m„i^ ) — = and (e"""" ) — e^"^' ~ 0. 
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A1.3 Posterior updates for q{a„t) 

The posteriors q{a„k) = JV^{ank\ank, ank) are rectified Gaussians, wiiere 

= |l + {e"-)^{/^(s«))| 

flnfc =anfc{e""")^ {X„t) -^{an3){fiSjt)) {/(Sfet)) 

t=i y j/fc y 

wilicli provide tile required sufficient statistics; (a^fc) = / al^i^Af^{a„k\a„k,a„k)da„k- 
A1.4 Posterior updates for q{xnt) 

The posteriors qixnt) ~ A/'(a;,it|x„t, i„t) are Gaussian, where 



These provide the required posterior expectation {x„t) — x„t- Here, the indicator variable bnt = 1 if ynt is observed and zero if it is missing. 
A1.5 Posterior updates for variance parameters 

FolIowing lRaiko et al\ i20d^ and lValpola. Harva & K arhunen" ("20041). in the exp-parameterisation (see §4.1.1), Gaussian priors are imposed 
on ve, where 6 stands for any of the variables a;, s or m in Fig.|2| This is then solved by a fixed form Gaussian variational posterior, 

q{v0) =M{vg\v0,Vg), (A9) 

and this has been employed in the experiments described in thi s paper. 

Here, vg and vg are computed by numerical optimisation I'Ra iko et a/.l2005tlValpola. Harva & Karhunenl2004h . Alternatively, a direct 
parameterisation may be used with Gamma priors and Gamma posteriors may be sufficient for this model. 

A2 The log evidence bound 

Computing the log of the evidence bound is useful for monitoring the convergence of the iterative estimation process, and more importantly, 
for comparing different modelling hypotheses. As mentioned, jS) decomposes into a number of terms. 

logp(Y) > Y,{\ogp{y„t\x„t)) + {{\ogp{e\ pa(e))) - (logg(e))} . (AlO) 
nt see 

The p and q terms for Gaussians will be given only once, while those that are different will be detailed separately. 
A2.1 p- and q-terms of Gaussian variables 

The p and q terms in (AlO), in the case of Gaussian variables, can be written as: 

{\ogp{ynt\xn,t,G'it)) = ^ | "S" " {xnt)f + Var{x„t}] + log 27ra^t I 
{logp(e|me,e-"'')) = -\{{e') [{9 - {mg)f + + Vs.Y{mg}\ - (i;^) + log 27r} 
-{logg(0)) = i + ilog2^e, 

where 6 and 9 are the posterior mean and v ariance of 6. If the exp-parameterisation is used for the variances, then the term (e"" ) = e 
— see lValpola. Harva & KarhunerJ <2004 . 

A2.2 p- and q-terms of Onk 

The p and q terms of a„fc, which are the elements of the matrix A in (Al), are given by: 

{logAA^(a|0, 1)) = -i {log27r + {a)2 + Var{a}} +log2 

12 

-(logg(a)) = — {Var{a} + ((a) - af] - - log — + log erfc(-a/\/2a), 



vi)+!ig/2 



where we have dropped the indices. 
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A2.3 The q-term of Skt 

The p-term of variable s is computed as in the case of other Gaussian variables. The g-term gives the following: 



ti=0 

For detailed derivation see lHarva&Kabir]<2005h . 



s2 

— log 



2s ° 



Mo{u)-^Mi{u) + ^M2{u)} . (All) 

Su 



A3 Missing value prediction 

A Gaussian approximation of the predictive distribution can be obtained as follows. 

qiVnAvl) = j d0p{yZ\0)q{0) (A12) 

d0_^„t / dx„tp{yn.t\Xnt)p{Xnt\0-Xr,t)'l{(^-x^t) (A13) 



fc 

^ AA(y™,|^(a„fc){/(s«)),<^^;", + {e'^-r') (A15) 

k 

where 0~x„t denotes hidden variables above x^t in the conditional dependency architecture of the model. The mean of this distribution 
provides a prediction for the missing entry ynt ■ Conveniently, for missing entries the mean of <A15> is identical to that of q{xnt) (as fe„t ~ 0), 
so that the imputation of missing values takes no additional computation. Further, evaluating <A15t when a^^^ — (i.e. no measurement of 
errors) - in order to test the prediction performance - also happens to be identical to evaluating q{x„t) from (A15) and §A1.4. 

A4 Required properties of A/'^(6ij^,^) 

The rectified Gaussian distribution along with its first and second order moments, used in § A 1.1 and § A 1.3, are listed here for completeness. 

j^"{e\e, §) = _^ u{e)N{e\e, e) (ai6) 



(9) = / 9N^(9\9,9)d9 = 9 + ' 
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{9^) = j 9^N^(9\9,9)d9 = 9^ + 9 + 
See lHarvj j2004) or lMiskirJ J2OO0I) for details. 



exp|(e/\/26i)2| erfc(-6'/\/26i) 
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^ exp|(6'/\/26i)2|erfc(-6l/\/2^) 



